{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 与Campaigns（活动）进行互动 <a class=\"anchor\" id=\"top\"></a>\n",
    "\n",
    "在这个 Notebook 中，您将部署并与Amazon Personalize中的活动互动。\n",
    "\n",
    "1. [介绍](#intro)\n",
    "1. [创建活动](#create)\n",
    "1. [与活动互动](#interact)\n",
    "1. [批量推荐](#batch)\n",
    "1. [小结](#wrapup)\n",
    "\n",
    "## 介绍 <a class=\"anchor\" id=\"intro\"></a>\n",
    "[回到顶部](#top)\n",
    "\n",
    "此时，您应有若干解决方案，每个解决方案至少有一个版本。一旦创建了解决方案版本，即可获得推荐，并了解其整体行为。\n",
    "\n",
    "首先，此 Notebook 将之前 Notebook 中的每个解决方案版本部署到单个活动中。一旦它们处于活动状态，就有资源来响应推荐，并通过帮助函数将输出转化成更人性化的内容。\n",
    "\n",
    "当您与客户一起使用Amazon Personalize时，您可以修改帮助函数以适应其数据输入文件的结构，从而保证其他工作正常执行。\n",
    "\n",
    "要重新开始，我们需要导入库、加载以前 Notebook 的值及 SDK。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "from time import sleep\n",
    "import json\n",
    "from datetime import datetime\n",
    "import uuid\n",
    "import random\n",
    "\n",
    "import boto3\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%store -r"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "personalize = boto3.client('personalize')\n",
    "personalize_runtime = boto3.client('personalize-runtime')\n",
    "\n",
    "# 建立与 Personalize 事件流的连接\n",
    "personalize_events = boto3.client(service_name='personalize-events')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 与活动互动 <a class=\"anchor\" id=\"interact\"></a>\n",
    "[回到顶部](#top)\n",
    "\n",
    "现在，所有活动都已部署并处于活动状态，我们可以开始通过 API 调用获得推荐。每个活动都基于不同的配方，其行为方式略有不同，因为它们服务于不同的使用案例。我们将以不同于以前 Notebook 的顺序覆盖每个活动，以便按照从易到难的顺序处理可能的复杂性（即最简单的优先）。\n",
    "\n",
    "首先，让我们创建一个支持函数，以帮助理解个性化活动返回的结果。个性化仅返回 \"item_id\"，这非常适合保持数据紧凑，但这意味着您需要查询数据库或查找表才能获得 Notebook 的个性化结果。我们将创建一个帮助函数，以返回 LastFM 数据集中人类可读的结果。\n",
    "\n",
    "首先加载数据集，我们可以使用这些数据集查找表。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 通过读取正确的源 CSV 为项目创建 dataframe\n",
    "items_df = pd.read_csv(dataset_dir + '/movies.csv', sep=',', usecols=[0,1], encoding='latin-1', dtype={'movieId': \"object\", 'title': \"str\"},index_col=0)\n",
    "\n",
    "# 显示一些示例数据\n",
    "items_df.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "通过将 ID 列定义为索引列，只需查询 ID 即可返回艺术家信息。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "movie_id_example = 589\n",
    "title = items_df.loc[movie_id_example]['title']\n",
    "print(title)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这并不可怕，但在我们的代码中到处重复这个会变得很混乱，所以下面的函数会清理它。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_movie_by_id(movie_id, movie_df=items_df):\n",
    "    \"\"\"\n",
    "    This takes in an artist_id from Personalize so it will be a string,\n",
    "    converts it to an int, and then does a lookup in a default or specified\n",
    "    dataframe.\n",
    "    \n",
    "    A really broad try/except clause was added in case anything goes wrong.\n",
    "    \n",
    "    Feel free to add more debugging or filtering here to improve results if\n",
    "    you hit an error.\n",
    "    \"\"\"\n",
    "    try:\n",
    "        return movie_df.loc[int(movie_id)]['title']\n",
    "    except:\n",
    "        return \"Error obtaining title\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现在，让我们测试一些简单的值，以检查我们的错误捕获。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A known good id (The Princess Bride)\n",
    "print(get_movie_by_id(movie_id=\"1197\"))\n",
    "# A bad type of value\n",
    "print(get_movie_by_id(movie_id=\"987.9393939\"))\n",
    "# Really bad values\n",
    "print(get_movie_by_id(movie_id=\"Steve\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "真棒！现在我们有了一种呈现结果的方法。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### SIMS\n",
    "\n",
    "SIMS 只需要一个项目作为输入，它将返回用户以类似方式与输入项目交互的项目。在此特定情况下，该项目是一部电影。 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "get_recommendations_response = personalize_runtime.get_recommendations(\n",
    "    campaignArn = sims_campaign_arn,\n",
    "    itemId = str(589),\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "item_list = get_recommendations_response['itemList']\n",
    "for item in item_list:\n",
    "    print(get_movie_by_id(movie_id=item['itemId']))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "恭喜您，这是你的第一个推荐列表！此列表很好，但最好有一个漂亮的 dataframe 来查看我们艺术家样本集的推荐。同样，让我们创建一个帮助函数来实现。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 更新 DF 绘制\n",
    "pd.set_option('display.max_rows', 30)\n",
    "\n",
    "def get_new_recommendations_df(recommendations_df, movie_ID):\n",
    "    # 获取 Movie 名称\n",
    "    movie_name = get_movie_by_id(movie_ID)\n",
    "    # 获取推荐结果\n",
    "    get_recommendations_response = personalize_runtime.get_recommendations(\n",
    "        campaignArn = sims_campaign_arn,\n",
    "        itemId = str(movie_ID),\n",
    "    )\n",
    "    # 构建新的推荐的 Dataframe\n",
    "    item_list = get_recommendations_response['itemList']\n",
    "    recommendation_list = []\n",
    "    for item in item_list:\n",
    "        movie = get_movie_by_id(item['itemId'])\n",
    "        recommendation_list.append(movie)\n",
    "    new_rec_DF = pd.DataFrame(recommendation_list, columns = [movie_name])\n",
    "    # 将此 Dataframe 加入到旧的 Dataframe 中。\n",
    "    recommendations_df = pd.concat([recommendations_df, new_rec_DF], axis=1)\n",
    "    return recommendations_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现在，让我们用几部不同的电影来测试帮助函数的功能。让我们从数据集中抽取一些数据，以测试我们的 SIMS 活动。从我们的 Dataframe 中获取5个随机电影。\n",
    "\n",
    "注意：我们将显示类似的标题，因此您可能需要重新运行样本，直到您识别列出的一些电影。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "samples = items_df.sample(5)\n",
    "samples"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sims_recommendations_df = pd.DataFrame()\n",
    "movies = samples.index.tolist()\n",
    "\n",
    "for movie in movies:\n",
    "    sims_recommendations_df = get_new_recommendations_df(sims_recommendations_df, movie)\n",
    "\n",
    "sims_recommendations_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "您可能会注意到，很多项目看起来一样，希望不是所有的项目都这样（这更有可能跟较少的互动数量相关，这在 Movielens 小数据集上很常见）。这表明，评估指标不应是您在评估解决方案版本时唯一依赖的指标。因此，当这种情况发生时，您可以做些什么来改进结果？\n",
    "\n",
    "这是一个思考个性化配方的超参数的好时机。SIMS 配方具有\"popularity_discount_factor\"超参数（参见 [文档](https://docs.aws.amazon.com/personalize/latest/dg/native-recipe-sims.html)。利用这个超参数可以让您控制您在结果中看到的细微差别。此参数及其行为对于您遇到的每个数据集都是唯一的，并且取决于业务的目标。您可以重复此超参数值，直到您对结果感到满意，或者您可以从利用个性化的超参数优化 （HPO） 功能开始。有关超参数和 HPO 调优的更多信息，请参阅 [文档](https://docs.aws.amazon.com/personalize/latest/dg/customizing-solution-config-hpo.html) 。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 用户个性化\n",
    "\n",
    "User-Personlization 是 Amazon Personalize 提供的更高级的算法之一。它支持根据特定用户过去的行为对项目进行个性化处理，并可以接收实时事件，以便无需再训练即可为用户更新推荐。\n",
    "\n",
    "由于 User-Personlization 依赖于对用户进行抽样，让我们加载所需的数据，并选择 3 个随机用户。由于 Movielens 不包括用户数据，我们将从数据集中的用户 ID 范围中选择 3 个随机数字。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if not USE_FULL_MOVIELENS:\n",
    "    users = random.sample(range(1, 600), 3)\n",
    "else:\n",
    "    users = random.sample(range(1, 162000), 3)\n",
    "users"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现在，我们为上面挑选的3个随机用户进行推荐。之后，我们将探索实时互动，然后再进入个性化排名。\n",
    "\n",
    "同样，我们创建一个帮助函数，在一个漂亮的 Dataframe 中呈现结果。\n",
    "\n",
    "#### API 调用结果"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 更新 DF 输出\n",
    "pd.set_option('display.max_rows', 30)\n",
    "\n",
    "def get_new_recommendations_df_users(recommendations_df, user_id):\n",
    "    # 获取电影名称\n",
    "    #movie_name = get_movie_by_id(artist_ID)\n",
    "    # 获取推荐\n",
    "    get_recommendations_response = personalize_runtime.get_recommendations(\n",
    "        campaignArn = userpersonalization_campaign_arn,\n",
    "        userId = str(user_id),\n",
    "    )\n",
    "    # 为推荐构建新的 Dataframe\n",
    "    item_list = get_recommendations_response['itemList']\n",
    "    recommendation_list = []\n",
    "    for item in item_list:\n",
    "        movie = get_movie_by_id(item['itemId'])\n",
    "        recommendation_list.append(movie)\n",
    "    #print(recommendation_list)\n",
    "    new_rec_DF = pd.DataFrame(recommendation_list, columns = [user_id])\n",
    "    # 将该 Dataframe 添加到旧的 Dataframe 上。\n",
    "    recommendations_df = pd.concat([recommendations_df, new_rec_DF], axis=1)\n",
    "    return recommendations_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "recommendations_df_users = pd.DataFrame()\n",
    "#users = users_df.sample(3).index.tolist()\n",
    "\n",
    "for user in users:\n",
    "    recommendations_df_users = get_new_recommendations_df_users(recommendations_df_users, user)\n",
    "\n",
    "recommendations_df_users"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在这里，我们清楚地看到，每个用户的推荐是不同的。如果您需要这些结果的缓存，您可以首先通过所有用户运行 API 调用并存储结果，或者您可以使用批量导出，稍后该批量导出将包含在此 Notebook 中。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现在，让我们应用项目过滤器，看看其中一个用户在某个类型中的推荐结果。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_new_recommendations_df_by_filter(recommendations_df, user_id, filter_arn):\n",
    "    # Get the movie name\n",
    "    #movie_name = get_movie_by_id(artist_ID)\n",
    "    # Get the recommendations\n",
    "    get_recommendations_response = personalize_runtime.get_recommendations(\n",
    "        campaignArn = userpersonalization_campaign_arn,\n",
    "        userId = str(user_id),\n",
    "        filterArn = filter_arn\n",
    "    )\n",
    "    # Build a new dataframe of recommendations\n",
    "    item_list = get_recommendations_response['itemList']\n",
    "    recommendation_list = []\n",
    "    for item in item_list:\n",
    "        movie = get_movie_by_id(item['itemId'])\n",
    "        recommendation_list.append(movie)\n",
    "    #print(recommendation_list)\n",
    "    filter_name = filter_arn.split('/')[1]\n",
    "    new_rec_DF = pd.DataFrame(recommendation_list, columns = [filter_name])\n",
    "    # Add this dataframe to the old one\n",
    "    recommendations_df = pd.concat([recommendations_df, new_rec_DF], axis=1)\n",
    "    return recommendations_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "您可以在给定类型中查看推荐的电影。在 VOD 应用程序中，您可以使用这些滤镜轻松创建陈列（也称为导航条或滚动条）。根据您掌握的有关项目的信息，您还可以过滤其他信息，如关键字、年/10年等。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "recommendations_df_shelves = pd.DataFrame()\n",
    "for filter_arn in meta_filter_arns:\n",
    "    recommendations_df_shelves = get_new_recommendations_df_by_filter(recommendations_df_shelves, user, filter_arn)\n",
    "for filter_arn in decade_filter_arns:\n",
    "    recommendations_df_shelves = get_new_recommendations_df_by_filter(recommendations_df_shelves, user, filter_arn)\n",
    "\n",
    "recommendations_df_shelves"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "下一个主题是实时事件。Amazon Personalize 能够监听应用程序中的事件，以便更新显示给用户的推荐结果。这在媒体行业（如视频点播）中尤其有用，因为客户的意图可能会因客户与孩子一起观看或独自观看而有所不同。\n",
    "\n",
    "此外，通过此系统记录的事件将一直存储，直到您发出删除调用为止，并且它们与您在训练下一个模型时提供的其他交互数据一起用作历史数据。\n",
    "\n",
    "#### 实时事件\n",
    "\n",
    "首先创建附加到活动的事件跟踪器。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "response = personalize.create_event_tracker(\n",
    "    name='MovieTracker',\n",
    "    datasetGroupArn=dataset_group_arn\n",
    ")\n",
    "print(response['eventTrackerArn'])\n",
    "print(response['trackingId'])\n",
    "TRACKING_ID = response['trackingId']\n",
    "event_tracker_arn = response['eventTrackerArn']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们将创建一些模拟用户与特定项目交互的代码。运行此代码后，您将收到与上述结果不同的建议。\n",
    "\n",
    "我们首先创建一些实时事件模拟方法。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "session_dict = {}\n",
    "\n",
    "def send_movie_click(USER_ID, ITEM_ID, EVENT_TYPE):\n",
    "    \"\"\"\n",
    "    Simulates a click as an envent\n",
    "    to send an event to Amazon Personalize's Event Tracker\n",
    "    \"\"\"\n",
    "    # Configure Session\n",
    "    try:\n",
    "        session_ID = session_dict[str(USER_ID)]\n",
    "    except:\n",
    "        session_dict[str(USER_ID)] = str(uuid.uuid1())\n",
    "        session_ID = session_dict[str(USER_ID)]\n",
    "        \n",
    "    # Configure Properties:\n",
    "    event = {\n",
    "    \"itemId\": str(ITEM_ID),\n",
    "    }\n",
    "    event_json = json.dumps(event)\n",
    "        \n",
    "    # Make Call\n",
    "    \n",
    "    personalize_events.put_events(\n",
    "    trackingId = TRACKING_ID,\n",
    "    userId= str(USER_ID),\n",
    "    sessionId = session_ID,\n",
    "    eventList = [{\n",
    "        'sentAt': int(time.time()),\n",
    "        'eventType': str(EVENT_TYPE),\n",
    "        'properties': event_json\n",
    "        }]\n",
    "    )\n",
    "\n",
    "def get_new_recommendations_df_users_real_time(recommendations_df, user_id, item_id, event_type):\n",
    "    # Get the artist name (header of column)\n",
    "    movie_name = get_movie_by_id(item_id)\n",
    "    # Interact with different movies\n",
    "    print('sending event ' + event_type + ' for ' + get_movie_by_id(item_id))\n",
    "    send_movie_click(USER_ID=user_id, ITEM_ID=item_id, EVENT_TYPE=event_type)\n",
    "    # Get the recommendations (note you should have a base recommendation DF created before)\n",
    "    get_recommendations_response = personalize_runtime.get_recommendations(\n",
    "        campaignArn = userpersonalization_campaign_arn,\n",
    "        userId = str(user_id),\n",
    "    )\n",
    "    # Build a new dataframe of recommendations\n",
    "    item_list = get_recommendations_response['itemList']\n",
    "    recommendation_list = []\n",
    "    for item in item_list:\n",
    "        artist = get_movie_by_id(item['itemId'])\n",
    "        recommendation_list.append(artist)\n",
    "    new_rec_DF = pd.DataFrame(recommendation_list, columns = [movie_name])\n",
    "    # Add this dataframe to the old one\n",
    "    #recommendations_df = recommendations_df.join(new_rec_DF)\n",
    "    recommendations_df = pd.concat([recommendations_df, new_rec_DF], axis=1)\n",
    "    return recommendations_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "此时，我们尚未生成任何实时事件：我们只设置了代码。要比较实时事件之前和之后的推荐结果，让我们选择一个用户并为其生成原始推荐结果。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 首先选择一个用户\n",
    "user_id = user\n",
    "\n",
    "# 为该用户获得推荐\n",
    "get_recommendations_response = personalize_runtime.get_recommendations(\n",
    "        campaignArn = userpersonalization_campaign_arn,\n",
    "        userId = str(user_id),\n",
    "    )\n",
    "\n",
    "# 为推荐列表构建新的 Dataframe\n",
    "item_list = get_recommendations_response['itemList']\n",
    "recommendation_list = []\n",
    "for item in item_list:\n",
    "    artist = get_movie_by_id(item['itemId'])\n",
    "    recommendation_list.append(artist)\n",
    "user_recommendations_df = pd.DataFrame(recommendation_list, columns = [user_id])\n",
    "user_recommendations_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "好了，在应用任何实时事件之前，我们现在有了一个给该用户的推荐列表。现在，让我们随机选择3个艺术家，模拟用户互动，然后看看这如何改变推荐结果。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 下一步生成3部随机电影\n",
    "movies = items_df.sample(3).index.tolist()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 请注意，由于休眠的原因，这大约需要15秒才能完成\n",
    "for movie in movies:\n",
    "    user_recommendations_df = get_new_recommendations_df_users_real_time(user_recommendations_df, user_id, movie,'click')\n",
    "    time.sleep(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现在，我们可以查看点击事件如何更改推荐结果。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "user_recommendations_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在上面的单元格中，索引后的第一列是用户个性化中的默认推荐，之后的每个列都有他们通过实时事件与艺术家互动的标题，以及此事件发生后的推荐。\n",
    "\n",
    "行为可能不会有太大的改变：这是由于此数据集规模有限，并且仅有几次随机点击。如果您想更好地了解这一点，请尝试模拟点击更多的电影，您应该能看到更明显的影响。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现在让我们看看事件过滤器，它允许您根据交互数据筛选项目。对于此数据集，它可以根据我们导入的数据进行单击或观看，但可能基于您设计的任何交互模式（单击、评分、观看、购买等）。对于 VOD 场景，您可以将标题从\"为您挑选的热门产品\"移动到\"再次观看\"，再次推荐的手表将基于用户当前的交互，但仅推荐已观看过的标题。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "recommendations_df_events = pd.DataFrame()\n",
    "for filter_arn in interaction_filter_arns:\n",
    "    recommendations_df_events = get_new_recommendations_df_by_filter(recommendations_df_events, user, filter_arn)\n",
    "    \n",
    "recommendations_df_events"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现在让我们针对4个未观看的推荐发送一个观看事件，这将模拟看4部电影。在 VOD 应用程序中，您可能会选择在观看某一内容的阈值超过一定量（如超过 75%）后发送事件。如果将阈值设得过高，如 100% 完成才发送，就可能会错过一部分事件。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    " # Get the recommendations\n",
    "top_unwatched_recommendations_response = personalize_runtime.get_recommendations(\n",
    "    campaignArn = userpersonalization_campaign_arn,\n",
    "    userId = str(user_id),\n",
    "    filterArn = filter_arn,\n",
    "    numResults=4)\n",
    "item_list = top_unwatched_recommendations_response['itemList']\n",
    "for item in item_list:\n",
    "    print('sending event watch for ' + get_movie_by_id(item['itemId']))\n",
    "    send_movie_click(USER_ID=user_id, ITEM_ID=item['itemId'], EVENT_TYPE='watch')\n",
    "    time.sleep(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现在，我们可以查看事件过滤器，查看更新的观看和未观看的推荐。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "recommendations_df_events = pd.DataFrame()\n",
    "for filter_arn in interaction_filter_arns:\n",
    "    recommendations_df_events = get_new_recommendations_df_by_filter(recommendations_df_events, user, filter_arn)\n",
    "    \n",
    "recommendations_df_events"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 个性化排名\n",
    "\n",
    "个性化排名的核心用例是收集项目，并将它们按用户优先级或可能感兴趣的顺序呈现。对于 VOD 应用程序，您需要根据一些信息动态渲染个性化栏目/布告栏/滚动信息（导演、位置、超级英雄专栏、电影时段等）。这可能不是您元数据中的信息，因此项目元数据筛选器将无法工作，但是您可能在系统中包含此信息以生成项目列表。\n",
    "\n",
    "为了证明这一点，我们将使用与以前相同的用户和项目随机集合。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "rerank_user = user\n",
    "rerank_items = items_df.sample(25).index.tolist()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现在构建一个显示输入数据的漂亮 Dataframe。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "rerank_list = []\n",
    "for item in rerank_items:\n",
    "    movie = get_movie_by_id(item)\n",
    "    rerank_list.append(movie)\n",
    "rerank_df = pd.DataFrame(rerank_list, columns = ['Un-Ranked'])\n",
    "rerank_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "然后进行个性化排名 API 调用。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 将用户转换为字符串:\n",
    "user_id = str(rerank_user)\n",
    "rerank_item_list = []\n",
    "for item in rerank_items:\n",
    "    rerank_item_list.append(str(item))\n",
    "    \n",
    "# 获得推荐的重新排序\n",
    "get_recommendations_response_rerank = personalize_runtime.get_personalized_ranking(\n",
    "        campaignArn = rerank_campaign_arn,\n",
    "        userId = user_id,\n",
    "        inputList = rerank_item_list\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现在，将重新排名的项目作为第二列添加到原始 Dataframe 中，以进行并排比较。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ranked_list = []\n",
    "item_list = get_recommendations_response_rerank['personalizedRanking']\n",
    "for item in item_list:\n",
    "    movie = get_movie_by_id(item['itemId'])\n",
    "    ranked_list.append(movie)\n",
    "ranked_df = pd.DataFrame(ranked_list, columns = ['Re-Ranked'])\n",
    "rerank_df = pd.concat([rerank_df, ranked_df], axis=1)\n",
    "rerank_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "根据模型对用户的理解，您可以在上面看到每个条目是如何重新排序的。当您有一个需要向用户推荐的项目集合时，这是一个受欢迎的任务，如促销列表。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 批量推荐 <a class=\"anchor\" id=\"batch\"></a>\n",
    "[回到顶部](#top)\n",
    "\n",
    "在许多情况下，您可能希望导出更大的推荐数据集。Amazon Personalize 推出了批量推荐，作为导出到S3的一系列推荐的一种方式。在此示例中，我们将了解如何为 User-Personalization 解决方案做到这一点。有关批量推荐的更多信息，请参阅 [文档](https://docs.aws.amazon.com/personalize/latest/dg/getting-recommendations.html#recommendations-batch)。此功能适用于所有算法配方，但输出格式会有所不同。\n",
    "\n",
    "简单的实现看起来像这样：\n",
    "\n",
    "```python\n",
    "import boto3\n",
    "\n",
    "personalize_rec = boto3.client(service_name='personalize')\n",
    "\n",
    "personalize_rec.create_batch_inference_job (\n",
    "    solutionVersionArn = \"Solution version ARN\",\n",
    "    jobName = \"Batch job name\",\n",
    "    roleArn = \"IAM role ARN\",\n",
    "    jobInput = \n",
    "       {\"s3DataSource\": {\"path\": S3 input path}},\n",
    "    jobOutput = \n",
    "       {\"s3DataDestination\": {\"path\":S3 output path\"}}\n",
    ")\n",
    "```\n",
    "\n",
    "SDK 导入、解决方案版本 arn 和角色 arn 都已确定，仅留下输入、输出和工作名称供定义。\n",
    "\n",
    "从 HRNN 的输入开始，它看起来像：\n",
    "\n",
    "```JSON\n",
    "{\"userId\": \"4638\"}\n",
    "{\"userId\": \"663\"}\n",
    "{\"userId\": \"3384\"}\n",
    "```\n",
    "\n",
    "这应该产生这样的输出：\n",
    "\n",
    "```JSON\n",
    "{\"input\":{\"userId\":\"4638\"}, \"output\": {\"recommendedItems\": [\"296\", \"1\", \"260\", \"318\"]}}\n",
    "{\"input\":{\"userId\":\"663\"}, \"output\": {\"recommendedItems\": [\"1393\", \"3793\", \"2701\", \"3826\"]}}\n",
    "{\"input\":{\"userId\":\"3384\"}, \"output\": {\"recommendedItems\": [\"8368\", \"5989\", \"40815\", \"48780\"]}}\n",
    "```\n",
    "\n",
    "输出是 JSON 行式文件。它由单个 JSON 对象组成，每行一个。因此，我们以后需要投入更多的工作来消化这种格式的结果。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 构建输入文件\n",
    "\n",
    "当您使用批处理功能时，您指定了希望在工作完成时收到推荐的用户。下面的单元格将再次选择几个随机用户，然后构建文件并将其保存到磁盘中。从那里，您将上传到 S3，稍后在 API 调用中使用。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 我们将使用以前相同的用户\n",
    "users\n",
    "# 将文件写入磁盘\n",
    "json_input_filename = \"json_input.json\"\n",
    "with open(data_dir + \"/\" + json_input_filename, 'w') as json_input:\n",
    "    for user_id in users:\n",
    "        json_input.write('{\"userId\": \"' + str(user_id) + '\"}\\n')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 展示输入文件：\n",
    "!cat $data_dir\"/\"$json_input_filename"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "将文件上传到 S3，并将路径保存为变量供以后使用。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 将文件上传到 S3\n",
    "boto3.Session().resource('s3').Bucket(bucket_name).Object(json_input_filename).upload_file(data_dir+\"/\"+json_input_filename)\n",
    "s3_input_path = \"s3://\" + bucket_name + \"/\" + json_input_filename\n",
    "print(s3_input_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "批量推荐读取我们上传到 S3 的文件中的输入。同样，批量推荐将节省在 S3 中归档的输出。因此，我们定义了应保存结果的输出路径。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 定义输出路径\n",
    "s3_output_path = \"s3://\" + bucket_name + \"/\"\n",
    "print(s3_output_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现在只需 API 调用触发批量输出过程。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "batchInferenceJobArn = personalize.create_batch_inference_job (\n",
    "    solutionVersionArn = userpersonalization_solution_version_arn,\n",
    "    jobName = \"VOD-POC-Batch-Inference-Job-UserPersonalization_\" + str(round(time.time()*1000)),\n",
    "    roleArn = role_arn,\n",
    "    jobInput = \n",
    "     {\"s3DataSource\": {\"path\": s3_input_path}},\n",
    "    jobOutput = \n",
    "     {\"s3DataDestination\":{\"path\": s3_output_path}}\n",
    ")\n",
    "batchInferenceJobArn = batchInferenceJobArn['batchInferenceJobArn']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "运行下面的 While 循环以跟踪批量推荐调用的状态。因为个性化推荐需要启动执行任务的基础设施，这需要大约 30 分钟才能完成。我们正在测试该功能的数据集只有 3 个用户，这只是演示之用。通常，您会使用此功能进行大批量处理以凸显效率。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "current_time = datetime.now()\n",
    "print(\"Import Started on: \", current_time.strftime(\"%I:%M:%S %p\"))\n",
    "\n",
    "max_time = time.time() + 6*60*60 # 6 hours\n",
    "while time.time() < max_time:\n",
    "    describe_dataset_inference_job_response = personalize.describe_batch_inference_job(\n",
    "        batchInferenceJobArn = batchInferenceJobArn\n",
    "    )\n",
    "    status = describe_dataset_inference_job_response[\"batchInferenceJob\"]['status']\n",
    "    print(\"DatasetInferenceJob: {}\".format(status))\n",
    "    \n",
    "    if status == \"ACTIVE\" or status == \"CREATE FAILED\":\n",
    "        break\n",
    "        \n",
    "    time.sleep(60)\n",
    "    \n",
    "current_time = datetime.now()\n",
    "print(\"Import Completed on: \", current_time.strftime(\"%I:%M:%S %p\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "s3 = boto3.client('s3')\n",
    "export_name = json_input_filename + \".out\"\n",
    "s3.download_file(bucket_name, export_name, data_dir+\"/\"+export_name)\n",
    "\n",
    "# Update DF rendering\n",
    "pd.set_option('display.max_rows', 30)\n",
    "with open(data_dir+\"/\"+export_name) as json_file:\n",
    "    # Get the first line and parse it\n",
    "    line = json.loads(json_file.readline())\n",
    "    # Do the same for the other lines\n",
    "    while line:\n",
    "        # extract the user ID \n",
    "        col_header = \"User: \" + line['input']['userId']\n",
    "        # Create a list for all the artists\n",
    "        recommendation_list = []\n",
    "        # Add all the entries\n",
    "        for item in line['output']['recommendedItems']:\n",
    "            movie = get_movie_by_id(item)\n",
    "            recommendation_list.append(movie)\n",
    "        if 'bulk_recommendations_df' in locals():\n",
    "            new_rec_DF = pd.DataFrame(recommendation_list, columns = [col_header])\n",
    "            bulk_recommendations_df = bulk_recommendations_df.join(new_rec_DF)\n",
    "        else:\n",
    "            bulk_recommendations_df = pd.DataFrame(recommendation_list, columns=[col_header])\n",
    "        try:\n",
    "            line = json.loads(json_file.readline())\n",
    "        except:\n",
    "            line = None\n",
    "bulk_recommendations_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 结束语 <a class=\"anchor\" id=\"wrapup\"></a>\n",
    "[回到顶部](#top)\n",
    "\n",
    "综上所述，您现在有一个全面工作的模型集合，以解决各种推荐和个性化场景，以及处理客户数据以更好地与服务集成的技能，以及如何通过 API 和利用开源数据科学工具实现这一切的知识。\n",
    "\n",
    "使用这些 Notebook 作为推进客户 POC 的指南。当您找到缺少的组件或发现新方法时，请获取代码并增加此集合中可能缺少的任何其他有帮助的组件。\n",
    "\n",
    "您需要确保清理在此 POC 期间部署的所有资源。我们提供了一个单独的 Notebook，向您展示如何识别和删除资源 `06_Clean_Up_Resources.ipynb`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%store event_tracker_arn\n",
    "%store batchInferenceJobArn"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "conda_python3",
   "language": "python",
   "name": "conda_python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
